A neural network model of a quasiperiodic elliptically polarizing undulator in universal mode

A machine-learning approach is introduced for producing a fast-executing model that predicts the polarization and energy of the light radiated by an insertion device.


Introduction
For decades, synchrotron light source facilities have produced highly brilliant and tunable photon beams for experiments across many scientific disciplines, in particular through the use of insertion devices (IDs). At the Canadian Light Source, the Quantum Materials Spectroscopy Center (QMSC) beamline uses an elliptically polarizing undulator (EPU) type ID with a magnetic period of 180 mm to produce soft X-rays with variable polarization in the energy range 15-200 eV.
In materials science, having the ability to probe the orbital structure of electronic states with linear and circular dichroism measurements is critical to understanding the underlying physics in the system under study. Angle-resolved photoemission is one technique that can extract additional information from a sample by utilizing arbitrary polarization at low photon energies (Day et al., 2019). However, 100% circular polarization is difficult to achieve because the beamline optics alter the polarization of low-energy photons as they propagate from the ID to the experiment endstation (Wurtz et al., 2014; Marcouille et al., 2007). This introduces the requirement for arbitrary polarization of the light at the EPU, along with the corresponding requirement of knowing the EPU operating parameters that will deliver photons of a certain energy and polarization on demand.
A planar ID has its gap as its single degree of freedom. In this case it is straightforward to build a one-dimensional look-up table relating the energy of the radiated photon beam to the device gap, where the look-up table is typically generated from magnetic or beam-based measurements. Operating an EPU in arbitrary polarization requires a multi-dimensional look-up table to relate its parameters to the energy and polarization state of the photon beam. Moreover, the overall system may drift over months or years, for example due to changes in characteristics of the undulator or the beamline optics. Look-up tables built from measured data can only be updated one point in the ID's configuration space at a time, and hence the total time necessary to (re-)measure data for multi-dimensional look-up tables from either beam-based or magnetic data becomes prohibitively large. The measurement time can be sidestepped by instead computing the undulator output polarization at any arbitrary point in configuration space from a model, for example using RADIA (Elleaume et al., 1997). However, such calculations remain time-consuming and the result is then limited by the accuracy of the model. The most attractive outcome is a fast-executing model that can be calibrated from a measured data set that is small compared with the size of a multi-dimensional look-up table. In this article we propose that neural networks can be just such a model, providing rapid, accurate predictions of the beam characteristics from a complex undulator.

Elliptically polarizing undulators and polarization
The QMSC undulator is a quasiperiodic APPLE-II type EPU. A section of its modelled magnet arrays is shown in Fig. 1. Certain magnet blocks are offset vertically to incorporate a quasiperiodic magnetic structure, which reduces contamination of the harmonics present in the undulator spectrum. Gap adjustments symmetrically change the vertical distance between the upper and lower magnet arrays. Independent longitudinal motion of the four girders is described with two independent parameters for operating the device, called the elliptical phase φ_E and composite linear phase φ_L (Sigrist et al., 2019). By adjusting these three operating parameters (gap, φ_E, φ_L), the strength and orientation of the undulator's magnetic field can be controlled, which in turn controls the energy and polarization of the radiated photons.
The polarization of the light radiated from the EPU can be described using the Stokes parameters S1, S2 and S3. For this application, the Stokes parameters are normalized and dimensionless, satisfying equation (1), where each parameter ranges from −1 to 1.
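As an illustration of the normalization referred to above, the following minimal sketch builds the normalized Stokes parameters of fully polarized light from the orientation and ellipticity angles of the polarization ellipse and checks that their quadrature sum is unity. It assumes that equation (1) expresses the standard relation S1² + S2² + S3² = 1 for fully polarized light; the function names and the angle parametrization are illustrative and are not taken from the original work.

```python
import math


def stokes_from_ellipse(psi, chi):
    """Normalized Stokes parameters (S1, S2, S3) of fully polarized light.

    psi: orientation angle of the polarization ellipse, in radians.
    chi: ellipticity angle, in radians (+/- pi/4 corresponds to circular light).
    """
    s1 = math.cos(2 * chi) * math.cos(2 * psi)
    s2 = math.cos(2 * chi) * math.sin(2 * psi)
    s3 = math.sin(2 * chi)
    return s1, s2, s3


def degree_of_polarization(s1, s2, s3):
    """Quadrature sum of the normalized Stokes parameters (1 if fully polarized)."""
    return math.sqrt(s1 ** 2 + s2 ** 2 + s3 ** 2)
```

In this parametrization, horizontal linear light corresponds to (psi, chi) = (0, 0) and circular light to chi = ±π/4; which sign of S3 is labelled left or right circular is a convention that this sketch does not fix.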

Machine learning
Machine learning (ML) techniques have been studied for various particle accelerator applications. Recently, ML-based surrogate models have obtained accurate and fast-executing representations of the relevant beam dynamics from a sparse sampling of the physics simulation (Edelen et al., 2020). Neural networks (NN), a sub-type of ML, have been trained to automatically tune and control large complex systems such as particle accelerators and insertion devices (Leemann et al., 2019; Scheinker et al., 2019). Their ability to be trained off-line using simulation data from computationally expensive codes and updated with measurement data has been demonstrated for multiple applications (Edelen et al., 2010, 2020). This type of ML algorithm is referred to as supervised learning because the model is trained on labelled data sets, that is, ground truth outputs exist for each input (Arpaia et al., 2021). In contrast to the simulation software from which ML models are trained, ML models can execute in fractions of a second with comparable accuracy in predicting the resulting beam parameters (Edelen et al., 2020). Additionally, the ability of ML models to be updated with new measurement data ensures that they remain accurate as the characteristics of the modelled device change (Edelen et al., 2020).
With these advantages in mind, accomplishing the objective of this work entails acquiring a large training data set from simulations. ML models are able to learn complex nonlinear relationships using large amounts of training data; however, producing a large training data set is computationally expensive (Leemann et al., 2019). In practice, the training data size depends on the complexity of the problem and complexity of the ML algorithm. Similar ML scenarios determined the amount of training data required by empirically evaluating the performance of their models with respect to the number of data points (Edelen et al., 2020). This technique was used to determine the size of the required training data set. By varying the resolution of the ID settings in the training data, the size of the data set would change without affecting the equal representation of the operating modes of the ID within the data. The difficulty for ML models to interpolate between training points increases for complex, many-parameter systems (Scheinker et al., 2019); therefore the data size was chosen such that the ID settings have sub-millimetre resolutions.

Modelling the undulator as a periodic device
As an initial proof of concept for this work, training data were generated from a RADIA model of the undulator built as a periodic device. In this simplified case, the photon beam characteristics are derived from the undulator's effective and nominal fields (Sigrist et al., 2019) using equations (2), (5), (6) and (7) (Sigrist, 2018).
The effective field is an approximation of the undulator's peak field, B̂, and is obtained via Fourier series decomposition of the modelled field profiles, B_x(y) and B_z(y). An example of one of these magnetic field profiles is shown in Fig. 2. Equation (3) shows the Fourier series decomposition over harmonic i. The effective field is specific to an EPU's (gap, φ_E, φ_L) settings. Nominal fields are gap-dependent, phase-independent terms, as per equation (4). B_z0 is given by B_zeff at a horizontal polarization and similarly B_x0 by B_xeff at a vertical polarization, such as φ_E = ±λ/2 (see Table 1 for a list of the variables).
However, describing the field profile in terms of effective fields introduces an approximation that holds poorly for quasiperiodic undulators. This point is illustrated in Fig. 3, which shows modelled undulator fields and their Fourier-determined effective equivalents for two cases. The upper plot shows a periodic undulator with a 55 mm period, where the effective field closely matches the undulator field; the lower plot shows the 180 mm quasiperiodic device under consideration, where the effective and undulator fields do not match. Calculating photon energy for the n = 1 harmonic from the effective field for this configuration yields 10.7 eV, whereas a more direct calculation (see next section) yields 9.4 eV. These results differ by 12%, which highlights the inapplicability of Fourier decomposition for studying quasiperiodic fields.

Figure 1
Section of the QMSC insertion device (EPU 180 mm) to illustrate the magnet array.
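The effective-field route described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the equations (2) to (7) of the original work: it assumes the common definition B_eff = sqrt(Σ_i (B_i / i)²) for the effective field, the textbook on-axis undulator equation for the photon energy, and a 2.9 GeV storage-ring energy for the usage note. All function names are hypothetical.

```python
import math


def fourier_amplitudes(field, n_harmonics):
    """Amplitudes B_i of harmonics i = 1..n_harmonics of a sampled field profile.

    field: equally spaced samples over exactly one period (endpoint not repeated).
    """
    m = len(field)
    amplitudes = []
    for i in range(1, n_harmonics + 1):
        cos_part = 2.0 / m * sum(b * math.cos(2 * math.pi * i * k / m)
                                 for k, b in enumerate(field))
        sin_part = 2.0 / m * sum(b * math.sin(2 * math.pi * i * k / m)
                                 for k, b in enumerate(field))
        amplitudes.append(math.hypot(cos_part, sin_part))
    return amplitudes


def effective_field(amplitudes):
    """Assumed definition: B_eff = sqrt(sum_i (B_i / i)^2), with i starting at 1."""
    return math.sqrt(sum((b / i) ** 2 for i, b in enumerate(amplitudes, start=1)))


def deflection_parameter(b_eff_tesla, period_m):
    """K = e B lambda / (2 pi m_e c), approximately 93.4 * B[T] * lambda[m]."""
    return 93.4 * b_eff_tesla * period_m


def photon_energy_ev(n, ring_energy_gev, period_m, k_x, k_z):
    """On-axis photon energy (eV) of harmonic n from the standard undulator equation."""
    return 9.496 * n * ring_energy_gev ** 2 / (
        period_m * (1.0 + (k_x ** 2 + k_z ** 2) / 2.0))
```

For a purely sinusoidal profile the effective field equals the peak field, which is why the approximation works for the 55 mm periodic device; a quasiperiodic profile spreads its content over harmonics that this sum does not represent well. The 12% figure quoted above corresponds to (10.7 − 9.4) / 10.7, that is, the difference taken relative to the effective-field result.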

Modelling the undulator as a quasiperiodic device
The photon beam characteristics can be determined without the approximation inherent to the effective field. This is achieved by modelling the undulator in its quasiperiodic configuration and exporting magnetic field data for analysis in the Synchrotron Radiation Workshop (SRW) code. Undulator radiation spectra are calculated at an observation window 8 mm by 8 mm in size and 18 m downstream of the undulator. The calculation uses a non-filament electron beam defined for a straight section in the Canadian Light Source (CLS) storage ring; see Table 2 for the beam characteristics. The spectra are calculated separately for the total (S0), horizontal (0°), vertical (90°), inclined linear (45°, 135°) and left- and right-circular polarizations. Stokes parameters are then obtained by comparing the flux at the n = 1 harmonic for the different polarizations.
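The comparison of polarization-resolved fluxes mentioned above can be sketched as below. The sketch assumes the conventional flux-difference definitions of the normalized Stokes parameters; the sign convention chosen for S3 (right minus left) and the function name are assumptions, since the text does not state them.

```python
def stokes_from_flux(total, horizontal, vertical, lin45, lin135, right, left):
    """Normalized Stokes parameters from fluxes at the n = 1 harmonic.

    total is the unpolarized-analyser flux (S0); the remaining arguments are
    the fluxes computed for each polarization component.
    """
    if total <= 0:
        raise ValueError("total flux must be positive")
    s1 = (horizontal - vertical) / total   # horizontal vs vertical linear
    s2 = (lin45 - lin135) / total          # 45 deg vs 135 deg linear
    s3 = (right - left) / total            # circular; sign is an assumed convention
    return s1, s2, s3
```

For purely horizontal light, for example, the horizontal flux equals the total flux while each of the inclined and circular components carries half of it, giving (S1, S2, S3) = (1, 0, 0).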
Scripting was developed in IGOR Pro to generate the large data set for training ML models (Wavemetrics, 2018). The script can import and process magnetic field data for any number of EPU configurations. For each configuration, undulator radiation spectra are computed across an energy range near the undulator's n = 1 harmonic; the expected energy is calculated using Fourier-determined effective fields. The precise photon energy of the n = 1 undulator harmonic is determined by fitting a curve to the total photon flux. The photon beam characteristics and their corresponding ID settings describe a single case for the ML model.

Table 1
Magnetic field terms and parameters (Sigrist, 2018).
Symbol | Term | Unit
B_x0, B_z0 | Nominal horizontal, vertical field | T

Figure 3
Magnetic field profile (solid) and effective field (dashed) calculated from a Fourier expansion thereof. Top: 55 mm periodic EPU. Bottom: 180 mm quasiperiodic EPU.

Figure 2
Magnetic field profile in EPU 180 mm, modelled as a periodic device, calculated at gap = 15 mm, φ_E = 0 mm and φ_L = −60 mm.
Two example sets of undulator spectra are shown in Fig. 4. The script's 'information pipeline' and overall procedure for training an ML model is illustrated in Fig. 5 (Edelen et al., 2020).
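The step in which the script locates the n = 1 harmonic by fitting a curve to the total flux could look like the following minimal sketch. The text does not specify the fitted curve, so a parabola through the three samples around the maximum is assumed here purely for illustration; the original IGOR Pro script may use a different function, and the name `refine_peak_energy` is hypothetical.

```python
def refine_peak_energy(energies, flux):
    """Energy of the flux maximum, refined with a three-point parabola.

    energies: equally spaced photon energies in eV, in ascending order.
    flux: total photon flux sampled at those energies.
    """
    k = max(range(len(flux)), key=flux.__getitem__)  # index of the sampled maximum
    if k == 0 or k == len(flux) - 1:
        return energies[k]  # maximum at the edge of the scan: nothing to refine
    y0, y1, y2 = flux[k - 1], flux[k], flux[k + 1]
    curvature = y0 - 2.0 * y1 + y2
    if curvature == 0.0:
        return energies[k]  # flat top: keep the sampled maximum
    step = energies[k + 1] - energies[k]
    # Vertex of the parabola through the three points around the maximum.
    return energies[k] + 0.5 * step * (y0 - y2) / curvature
```

A local fit of this kind only needs the spectrum to be sampled in a window around the expected harmonic energy, which is consistent with using the effective-field estimate to choose the energy range.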
Lastly, it is important to note that this methodology amounts to training an ML model based on the output of a RADIA model. Prior to this work, the RADIA model was refined with bench-based magnetic measurements of the actual undulator using a Hall probe and flipping coil setup. The RADIA model's tuning process considered 45 EPU configurations, with priority given to planar and vertical polarization modes across various gap settings. Across the considered configurations, the typical relative difference between the modelled magnetic fields and the bench-based measurements, on- and off-axis, is 1%.

Developing a neural network model
A neural network model was created to predict four outputs, namely the photon beam energy and Stokes parameters S1, S2, S3. A neural network is composed of individual neurons that accept multiple inputs and produce a single output. These neurons are arranged in layers to form a connected network (Smith, 1997). The developed neural network is a feedforward network in that the data propagate from input to output without looping between intermediate layers. The created model is hereafter referred to as NN4, as its final layer has four output neurons corresponding to the beam parameters. The neural network was implemented using Keras with a TensorFlow 2.0 backend and the open-source scikit-learn package (Abadi et al., 2016; Pedregosa et al., 2011).
The architecture of the neural network is a fully connected network with four hidden layers (128-64-32-16) and a rectified linear unit activation function for each hidden layer. The model was trained using backpropagation with the Adam optimizer (LeCun et al., 1989). The mean squared error (MSE) was used both as a loss function and as a metric to monitor the performance of the model; the MSE compares the model output, namely scaled photon energy and Stokes parameters, to the training data. The neural network model used scaled inputs in the range (0, 1) and scaled outputs in the range (−1, +1). The data not used for training are divided equally between validation and testing, resulting in a 60-20-20 split of the training, validation and testing data, respectively.
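The architecture described above can be sketched in Keras as follows. The layer sizes, activation functions, optimizer and loss come from the text; the three inputs (gap, φ_E, φ_L), the linear output layer and the default Adam settings are assumptions of this sketch, and `build_nn4` is a hypothetical name rather than the authors' code.

```python
import tensorflow as tf


def build_nn4(n_inputs=3, n_outputs=4):
    """Feedforward network with hidden layers 128-64-32-16 and four outputs."""
    model = tf.keras.Sequential(
        [
            tf.keras.Input(shape=(n_inputs,)),            # scaled gap, phi_E, phi_L
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(16, activation="relu"),
            # Linear output for scaled photon energy, S1, S2 and S3.
            tf.keras.layers.Dense(n_outputs, activation="linear"),
        ],
        name="NN4",
    )
    # MSE serves as both the loss and the monitored metric, as in the text.
    model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse", metrics=["mse"])
    return model
```

With three inputs this network has 11 444 trainable parameters, a small number compared with the 8175 simulated cases multiplied by four outputs, which is one reason such a compact architecture can be trained without heavy regularization.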
Although a deep (many hidden layers) and wide (many nodes per layer) NN generally provides better fitting on training data, it is prone to overfitting (Leemann et al., 2019). This issue was minimized by shuffling the data, implementing a learning schedule, adjusting the number of epochs (the number of complete passes over the training data), and adjusting the batch size (the number of cases shown to the model per weight update).
The simulated training data for the NN model contained 4175 cases that sampled the EPU's operating modes: planar, vertical (φ_E and φ_L), circular (helicity 1 and 2), elliptical, linear, inclined (helicity 1 and 2), and a selection of universal modes near circular at photon energies of interest to the beamline. To cover the total configuration space of the device, an additional 1000 cases were randomly generated for each quadrant formed by φ_E and φ_L, for a total of 8175 cases. The 4175 and 8175 case data sets are shown in Figs. 6 and 7, respectively. A single operating mode, E45 : L45 (φ_E = 45 mm, φ_L = 45 mm, gap = 15 mm), was not included in the smaller data set for reasons explained in Section 5.
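The random sampling of the four phase quadrants can be illustrated with the minimal sketch below. It assumes uniform sampling and uses the gap range of 15 mm to 200 mm and the phase range of −90 mm to 90 mm quoted for the unscaled inputs; the actual sampling distribution is not stated in the text, and the function name is hypothetical.

```python
import random

GAP_RANGE_MM = (15.0, 200.0)  # unscaled gap range
PHASE_MAX_MM = 90.0           # phases span -90 mm to +90 mm


def random_cases_per_quadrant(n_per_quadrant=1000, seed=0):
    """Random (gap, phi_E, phi_L) settings, n_per_quadrant in each phase quadrant."""
    rng = random.Random(seed)
    cases = []
    for sign_e in (+1, -1):
        for sign_l in (+1, -1):
            for _ in range(n_per_quadrant):
                gap = rng.uniform(*GAP_RANGE_MM)
                phi_e = sign_e * rng.uniform(0.0, PHASE_MAX_MM)
                phi_l = sign_l * rng.uniform(0.0, PHASE_MAX_MM)
                cases.append((gap, phi_e, phi_l))
    return cases
```

Four quadrants of 1000 cases each give 4000 random cases, which together with the 4175 operating-mode cases make up the 8175 cases of the complete data set.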
The train/test splitting technique for sorting the simulated data was employed with the ML models. This method entails dividing the data so that one group is used to train the model and a separate group is used to test the model. This computationally efficient approach was suitable because the data were shuffled prior to sorting, thereby guaranteeing the configuration space of the device was equally represented in the training and testing data sets. An appropriate distribution of the training and testing data is critical for effectively evaluating model performance. Note that this approach is equivalent to performing a k-fold cross-validation procedure with k = 2 (Stone, 1974). The unscaled inputs occupy the following ranges: gap = 15 mm to 200 mm; φ_E and φ_L = −90 mm to 90 mm. The unscaled outputs occupy the following ranges: photon energy = 6 eV to 400 eV; Stokes parameters = −1 to 1.

Figure 4
Radiation spectra calculated for various polarization modes in SRW for two configurations of the quasiperiodic RADIA model. The configurations are at different gaps and are both in planar polarization (φ_E = φ_L = 0 mm).

Figure 5
The procedure for training an ML model using the physics simulation as the ground truth (Edelen et al., 2020). The ML model parameters are dynamically updated during training. The hyperparameters of the model are manually adjusted until the model reaches a target performance.
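The scaling and splitting of the simulated data can be sketched as follows. The unscaled ranges, the target ranges of (0, 1) for inputs and (−1, +1) for outputs, and the 60-20-20 proportions are taken from the text; linear min-max scaling, the fixed random seed and all names are assumptions of this sketch.

```python
import random

# Unscaled ranges quoted in the text (mm for the inputs, eV for the photon energy).
INPUT_RANGES = ((15.0, 200.0), (-90.0, 90.0), (-90.0, 90.0))            # gap, phi_E, phi_L
OUTPUT_RANGES = ((6.0, 400.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0))   # energy, S1, S2, S3


def rescale(value, old_range, new_range):
    """Linear min-max mapping of value from old_range onto new_range."""
    (old_lo, old_hi), (new_lo, new_hi) = old_range, new_range
    return new_lo + (value - old_lo) * (new_hi - new_lo) / (old_hi - old_lo)


def scale_inputs(gap, phi_e, phi_l):
    """Map the ID settings onto the range 0 to 1."""
    return tuple(rescale(v, r, (0.0, 1.0))
                 for v, r in zip((gap, phi_e, phi_l), INPUT_RANGES))


def scale_outputs(energy, s1, s2, s3):
    """Map the beam characteristics onto the range -1 to +1."""
    return tuple(rescale(v, r, (-1.0, 1.0))
                 for v, r in zip((energy, s1, s2, s3), OUTPUT_RANGES))


def split_60_20_20(cases, seed=0):
    """Shuffle, then split into training, validation and testing subsets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Shuffling before splitting matters here because the data set is assembled mode by mode; without it, whole operating modes could end up in only one of the subsets.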

Predicting modelled EPU beam characteristics
The following results are drawn from the model's performance on the simulated data sets for the EPU. The model used batch sizes of 16, a customized decaying learning rate schedule, and trained for 1500 epochs.
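The form of the customized decaying learning rate schedule is not given in the text. As a purely hypothetical illustration, an exponential decay could be written as below; the initial rate, decay factor and decay interval are placeholder values and not the settings used in this work.

```python
def decayed_learning_rate(epoch, initial_lr=1e-3, decay_rate=0.5, decay_epochs=300):
    """Hypothetical schedule: the rate is multiplied by decay_rate every decay_epochs."""
    return initial_lr * decay_rate ** (epoch / decay_epochs)


def updates_per_epoch(n_train, batch_size=16):
    """Number of weight updates in one pass over n_train cases (ceiling division)."""
    return -(-n_train // batch_size)
```

A function of the epoch index such as `decayed_learning_rate` can be passed to the Keras `LearningRateScheduler` callback. With a batch size of 16, a training subset of, for example, 4905 cases corresponds to 307 weight updates per epoch.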

Configuration space
The first iteration of the NN model was trained on the simulated operating-modes data set, totalling 4175 unique cases. After testing the model on cases from the measured data set, it was apparent that the model did not generalize well to domains in configuration space not covered in the training data. In particular, a single operating mode (E45 : L45) contained in the measured data set was not encompassed by the domain of the simulated operating-modes data set (Fig. 6) used to train the model. This E45 : L45 case was intentionally set aside from the simulated operating-mode data set to observe the model's ability to extrapolate for new domains. The predictions made by the NN model on the measured data set are shown in Fig. 8 to demonstrate how the single E45 : L45 case stands apart from other predicted cases. Although strict agreement between the predicted and test cases is not expected because the simulated and measured data sets are inherently unique, general agreement is expected.
To test the prediction that the model requires training on each domain for which it will be tested, the model was trained on the operating-modes data set (4175 cases) with one additional quadrant of randomly generated data. The model accurately predicted the E45 : L45 case when the extra quadrant data encompassed the E45 : L45 case and poorly otherwise.
Since we desire an ML model that may be used to predict the EPU beam characteristics for any operating mode, current or future, the second iteration of training the model was performed on the complete data set shown in Fig. 7, which will be referred to as the simulated data set from now on.

Results
The ML model was evaluated based on its MSE, the mean norm of the Stokes error vector (MSEV) shown in equation (8), the mean absolute percentage error (MAPE) of the predicted photon energies, and whether it satisfied the QMSC beamline's error threshold shown in equation (9). Equation (8) uses the Stokes parameters in vector notation, where S_pred and S_true are the predicted and target Stokes vectors, respectively. The MSEV then represents the norm of the Stokes error vector averaged over n test cases. Equation (9) states that, for operation in circular mode, |S3| = 1, the residual components of the Stokes vector must be less than a threshold of 0.03. To evaluate how well the model generalized to the input configuration space, the predictions made by the model were compared with the target values. This step is performed using the testing data, which is 'unseen' by the model during its training. The NN model was compiled 30 times to establish its average performance. A summary of the model's performance at predicting EPU beam characteristics from simulated data is given in Table 3.

Figure 7
Total distribution of simulated data. The randomly generated cases are represented as black points to distinguish them from the initial 4175 cases.

Figure 8
Comparison between predicted and test output Stokes parameters for measured data from the NN. Excellent agreement is shown for the majority of test cases except for the E45 : L45 case (predicted S1 value of ~0.26 versus a test value of ~−0.21; predicted S2 value of ~0.8 versus a test value of ~0.97; predicted S3 value of ~0.31 versus a test value of ~0).
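The evaluation metrics described for equations (8) and (9) can be sketched as follows. The MSEV follows the verbal definition given in the text (the Euclidean norm of the Stokes error vector, averaged over the test cases). The form of the residual in the threshold test, sqrt(S1² + S2²), is an assumption of this sketch, since equation (9) itself is not reproduced here; all function names are hypothetical.

```python
import math


def mean_squared_error(pred, true):
    """MSE over every output of every case (each case is a sequence of outputs)."""
    errors = [(p - t) ** 2 for ps, ts in zip(pred, true) for p, t in zip(ps, ts)]
    return sum(errors) / len(errors)


def msev(s_pred, s_true):
    """Mean norm of the Stokes error vector over the test cases."""
    norms = [math.dist(p, t) for p, t in zip(s_pred, s_true)]
    return sum(norms) / len(norms)


def circular_residual(s1, s2):
    """Assumed residual for circular mode: magnitude of the linear components."""
    return math.hypot(s1, s2)


def within_threshold(s1, s2, threshold=0.03):
    """True if the residual is below the beamline threshold of 0.03."""
    return circular_residual(s1, s2) < threshold
```

Unlike the MSE, which averages squared errors over the four outputs separately, the MSEV treats each predicted Stokes vector as a whole, so a case with a moderate error in all three components counts more than its per-component errors alone would suggest.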
The QMSC beamline's error threshold was satisfied by the model; see Fig. 9. The predicted residuals follow the same trend (magnitude and frequency) as the residuals from the test values. Cases were selected for inclusion in Fig. 9 using the tolerance |S3| = 1 ± 0.01. In the majority of such cases, the undulator linear phase is zero and gap and elliptical phase are coordinated; these are the typical usage cases and form the first peak near zero residual. The selected cases also include randomly generated undulator configurations where a small linear phase setting may inflate the residual; these cases form the second peak near 0.08.
The accuracy of the 1633 predictions made by the NN4 model on the test data is shown in Table 4. A regression score, R², is calculated for each output beam characteristic to indicate the correlation between predicted and test values. The variances of the relative errors from the predicted test cases of the NN4 model are included to represent the distribution of errors. The averaged MAPE indicates that the NN4 model predicted the photon energy within 2.80%. The near-unity R² values for each output indicate that the NN4 model accurately predicts the EPU beam characteristics. The small variances of the relative errors in predictions indicate that the predictions made by the NN4 model are tightly distributed around the mean (zero).
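For reference, the regression score and the MAPE used above can be computed as in this minimal sketch. It uses the standard definitions (coefficient of determination and mean absolute percentage error) in plain Python; the original work may have used the equivalent scikit-learn functions, and the function names here are illustrative.

```python
def r2_score(pred, true):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def mape(pred, true):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((p - t) / t) for p, t in zip(pred, true)) / len(true)
```

The MAPE is only meaningful for the photon energy, whose unscaled values are bounded away from zero; for the Stokes parameters, which pass through zero, the MSEV and R² are the more suitable measures.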

Updating the neural network model using transfer learning
The ability of an ML model to be updated with new data, as mentioned in Section 1, was investigated to determine whether the model could predict the Stokes parameters derived from the magnet measurement data. The limited measured data mentioned in Section 3 were used to update the ML model.
Since the beam characteristics are only slightly different between the simulated and measured data sets, and the measured data set is small, the calibrated neural network model (Calibrated NN) used the entire NN4 model as the base model. This methodology involving the bottleneck layer of a trained model in transfer learning applications has been demonstrated by several computer vision works (Wang et al., 2020). Since the bench-based magnetic measurements capture a small subset of the EPU configuration space, the updated model fits the measured data better with fewer trainable parameters. The measured data set contains 169 cases that proportionally represent the operating modes of the device; the 45 cases used to tune the RADIA model are a subset of this data set. The Calibrated NN model was trained on 60% of these data (101 cases) and tested on the remaining 40% (68 cases). The model used a batch size of 4, a customized decaying learning rate schedule, and trained for 300 epochs. This updated neural network (Calibrated NN) adds one additional layer, identical in structure to the base model output layer (size 4, fully connected, using the linear activation function). A diagram is provided in Fig. 10 to show the architecture of the Calibrated NN model. Similar techniques to those described in Section 4 were employed to optimize the Calibrated NN model.

Table 3
The model's performance in predicting EPU beam characteristics.
The metrics in this table are computed on the scaled predicted values, which lie in the range −1 to 1. The metrics of the best performing model from the compilation are included to illustrate the model's performance potential.

Figure 9
Histogram comparing the residuals for predictions on |S3| = 1 cases by the NN4 model with the residuals from the test data set. The selected cases are those for which the test value satisfies |S3| = 1 within a tolerance of ±0.01.
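The Calibrated NN construction described above can be sketched in Keras as follows. The added layer (size 4, fully connected, linear activation) is taken from the text. Freezing the base model is an assumption suggested by the remark about fewer trainable parameters, and the stand-in base model, the optimizer settings and all names are hypothetical.

```python
import math

import tensorflow as tf


def build_base_model():
    """Stand-in for the trained NN4 model (weights here are untrained)."""
    return tf.keras.Sequential(
        [tf.keras.Input(shape=(3,))]
        + [tf.keras.layers.Dense(n, activation="relu") for n in (128, 64, 32, 16)]
        + [tf.keras.layers.Dense(4, activation="linear")],
        name="NN4",
    )


def build_calibrated_model(base_model, freeze_base=True):
    """Add one size-4 linear layer on top of the base model outputs."""
    base_model.trainable = not freeze_base   # assumption: base weights stay fixed
    inputs = tf.keras.Input(shape=(3,))
    base_outputs = base_model(inputs, training=False)
    outputs = tf.keras.layers.Dense(4, activation="linear", name="calibration")(base_outputs)
    model = tf.keras.Model(inputs, outputs, name="calibrated_nn")
    model.compile(optimizer="adam", loss="mse")
    return model


def count_trainable(model):
    """Total number of trainable scalar parameters in a Keras model."""
    return sum(math.prod(tuple(w.shape)) for w in model.trainable_weights)
```

With the base model frozen, only the 4 × 4 weights and 4 biases of the added layer, 20 parameters in total, are fitted to the 101 measured training cases. Training would then proceed with a call such as `model.fit(x_meas, y_meas, batch_size=4, epochs=300)`, where `x_meas` and `y_meas` stand for the scaled measured inputs and outputs.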

Transfer learning results
The Calibrated NN model was evaluated with the same metrics as described in Section 5, although it is now evaluated on the measured data set. The Calibrated NN model was compiled 60 times to establish an average performance. It is important to note that the performance of the Calibrated NN model may only be compared with the NN4 model when they are evaluated on similar data sets. To that end, the performance of the NN4 model was averaged over 60 trials and evaluated on the measured data set, rather than the simulated data on which it was trained. For comparison, a separate neural network model was also created and trained solely on the measured data. This Limited NN model was optimized to fit the measured data set and its performance was averaged over 60 trials. Results for these three models are listed in Table 5.
Like the NN4 model's performance on the simulated data, the Calibrated NN model also satisfied the QMSC beamline's error threshold on the measured data. The largest residual for the six |S3| = 1 test cases was 0.005. The Calibrated NN model's prediction accuracy on the measured data set is included in Table 6.

Discussion
The results in Table 5 indicate that the Calibrated NN model outperforms the Limited NN model by roughly two orders of magnitude. This comparison demonstrates the advantage of applying transfer learning to a base model that was first thoroughly trained on simulated data when the measurement-based data set is small. The improvement of the Calibrated NN model upon the NN4 model is identified by the smaller errors in predictions. However, the MAPE for the Calibrated NN is larger than that for the NN4 model when evaluated against the measured data. This indicates that the transfer learning somewhat reduced the Calibrated NN model's accuracy in predicting the photon energy of the beam, despite improving the accuracy of the predicted Stokes parameters.
The similar performance characteristics between Tables 4 and 6 indicate that the Calibrated NN model has a comparable prediction accuracy on the measured data to the NN4 model on the simulated data. The near-unity regression scores in Table 6 indicate that the model is accurately predicting the beam characteristics and the small variances imply that the relative errors are small and closely distributed around zero.

Conclusion
The results of this work demonstrate the feasibility of generating an ML model to accurately predict the photon beam characteristics of a quasiperiodic EPU. More specifically, this work demonstrates the ability of a neural network to accurately model the complex, multi-parameter functions of an ID. This outcome was achieved by optimizing the neural network model to fit a large simulated data set. The importance of properly sampling the configuration space in the development of a neural network was also demonstrated by the E45 : L45 case.
Secondly, the successful application of transfer learning demonstrates how the neural network model was easily adapted to a measured data set. This stage was accomplished by building a separate neural network model, referred to as the Calibrated NN model, based on the NN4 model. This model was then trained on the limited magnetic measurement data set to provide more accurate predictions of the radiated light at the ID. The predictions produced by the Calibrated NN model satisfy the QMSC beamline's error threshold and the relative errors in predictions were shown to be within an acceptable threshold. The photon energy was predicted more accurately by the NN4 than the Calibrated NN model, as indicated by the slightly smaller averaged MAPE. However, this accuracy difference is small. The Calibrated NN model showed promising improvement in predicting the Stokes vector. The MSEV was determined to be 1.86 × 10⁻², indicating the predicted Stokes vectors closely agree with the test Stokes vectors. Thirdly, the deployment of this updated neural network model provides a synchrotron beamline with a fast-executing model for producing look-up tables and/or predicting single ID cases.

Figure 10
Architecture of the Calibrated NN model. The NN4 model acts as the base model with four outputs and the additional layer is added on top of NN4 (shown in green).

Table 5
Summarized performance of the neural networks on the measured data set.
The metrics in this table are computed on the scaled predicted values, which lie in the range −1 to 1. The Limited NN model was trained using bench-based magnetic measurement data. The NN4 model was trained using simulated data. The Calibrated NN used NN4 as a base model and was then updated using the measured data via transfer learning.

The regression scores are based on the scaled predicted values and the variance scores are computed from the unscaled predicted values.
Future work for this project includes the development of an ML model that will predict the beam characteristics at the endstation by following a similar training and calibration approach. Polarization measurements will be acquired using a polarimeter located at the endstation and used to calibrate a neural network from this work. The completion of this work will provide users with an efficient tool for predicting the endstation beam characteristics for arbitrary ID configurations.